[BugFix] Fix hermes tool parser output error stream arguments in some cases (#10395) #10398
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
cc @K-Mistele
The function partial_json_parser will complete the partial JSON string `{"name": "tool_name", "arguments": {"arg1": "` to `{"name": "tool_name", "arguments": {"arg1": ""}}`, which results in an error when calculating the incremental changes in the arguments field.
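The failure mode can be reproduced in miniature with the standard `json` module (a sketch only: the two snapshots are hand-written stand-ins for what partial_json_parser would produce, per the completion behavior described above):

```python
import json

# Two successive JSON-completed snapshots of the streamed arguments, as
# partial_json_parser would produce them (hand-written for illustration):
prev_args = {"arg1": ""}       # '{"arg1": "' completed to an empty string
curr_args = {"arg1": "value"}  # after more tokens of the value arrive

prev_sent = json.dumps(prev_args)  # serialization already streamed to the client
curr_full = json.dumps(curr_args)  # serialization of the newer snapshot

# The streaming parser emits the suffix of the new serialization, which is
# only valid if the previously sent text is a prefix of it:
print(curr_full.startswith(prev_sent))  # False -> the computed delta is wrong
```

Because the premature `""` closes the string value, the previously streamed text is no longer a prefix of the newer serialization, so suffix-based deltas come out garbled.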
Thanks, everyone, for working on this potential fix!
For the relevant code, please refer to the following link: #10395
Totally missed that, thanks a lot!
Please fix the linter errors.
Don't worry about DCO; we can pass it manually if you agree to it.
… cases (vllm-project#10395) Signed-off-by: xiyuan lee <[email protected]>
I've fixed the linter errors.
Thanks for your patience!
… cases (vllm-project#10395) (vllm-project#10398) Signed-off-by: xiyuan lee <[email protected]>
I used the hermes_tool_parser.py as
Here is how I start the vLLM service:
I use the vllm 0.6.3.post1 image with docker compose, and the entrypoint is: `entrypoint: ["/bin/sh", "-c", "python3 -u -m vllm.entrypoints.openai.api_server --model /data/models/Qwen2.5-72B-Instruct-AWQ --enable-auto-tool-choice --tool-call-parser hermes --tensor-parallel-size 2 --gpu_memory_utilization 0.97 --max_model_len 20000 --max_num_seq 40"]`
[BugFix] Fix hermes tool parser output error stream arguments in some cases (#10395)
Use delta_text to return directly to the user, preventing errors caused by partial_json_parser during the incremental computation of arguments.

FIX #9693
FIX #9908
FIX #10395
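The fixed streaming behavior can be sketched as follows. This is an illustrative model only, not vLLM's actual hermes_tool_parser.py: once the stream is inside the `"arguments"` object, each raw delta_text chunk is forwarded as-is instead of diffing two JSON-completed snapshots. The brace counting here is simplified and ignores braces inside string values.

```python
def stream_tool_arguments(chunks):
    """Yield argument deltas for a streamed tool call (simplified model).

    After the '"arguments"' key is seen, raw delta_text chunks are passed
    through directly, so no spurious quotes or braces from JSON completion
    are ever emitted to the client.
    """
    buffer = ""
    in_arguments = False
    depth = 0
    for delta_text in chunks:
        if not in_arguments:
            buffer += delta_text
            key = '"arguments"'
            if key not in buffer:
                continue
            in_arguments = True
            # Forward only the part of the buffer after the key and its
            # ': ' separator (simplified handling).
            delta_text = buffer[buffer.index(key) + len(key):].lstrip(': ')
        out = []
        for ch in delta_text:
            if depth == 0 and ch != '{':
                continue  # skip until the arguments object opens / after it closes
            if ch == '{':
                depth += 1
            elif ch == '}':
                depth -= 1
            out.append(ch)
            if depth == 0:
                break  # arguments object closed; stop emitting
        if out:
            yield ''.join(out)

chunks = ['{"name": "get_weather", "argum', 'ents": {"city": "Par', 'is"}}']
print(''.join(stream_tool_arguments(chunks)))  # {"city": "Paris"}
```

Since each emitted piece is a substring of the model's own output, the concatenation of the deltas is always a prefix of the final arguments JSON, which is the invariant the pre-fix code violated.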